-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-18708: Support S3 Client Side Encryption(CSE) With AWS SDK V2 #6884
Conversation
@steveloughran @ahmarsuhail Could you please review the changes ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, no time to review right now, vector IO problems
I don't want another s3 client as it will only hurt boot time. are already hitting problems here where client init time is O(jar) & I'm thinking of lazy creation of the async one.
look at how we did this before: we altered file lengths in listings.
- no to netty dependencies
- no to any dependency on the new library except when CSE is enabled.
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java
Outdated
Show resolved
Hide resolved
FYI, Jira on create performance. I'm worrying about how to remove/delay client configuration, so adding a new client makes things way worse: HADOOP-19205 |
by default we are not initializing new s3 client. It is done under a config to provide backward compatibility only when required. |
It's still happening in FileSystem.initialize(). If I get time this week I'll do my PoC of lazy creation of transfer manager and async client, as that's doubled startup time already. all of that will be moved behind S3Store with only the copy() methods exposed. Ultimately I want the S3Client itself hidden behind that API, so here
Actually, |
#6892 is the draft pr with lazy creation; a new s3 client would go in there, but ideally we'd actually push the listing/transformation code into S3AStore so the extra client would never be visible outside it. |
@steveloughran - I really like the idea of lazy initializing async client and new interface for creating s3 client. I will wait for your changes to get in and refactor my code based on it. By this way i could lazy initialize unencryted s3 client as well when the config is enabled. |
exactly. when its not needed: no penalty. |
@steveloughran @ahmarsuhail I have rebased the changes on top of #6892 . Thanks |
2593281
to
bc9cfbd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(intellij seems to be hiding or has lost some of the review comments...let me see once this review is submitted
)
I'm afraid one thing I now want changed across the entire code base is to stop passing s3 client instances around, so that we can move in future to using the AsyncClient everywhere. The S3AStoreImpl class would provide accessor methods to all operations needed, s3afs would invoke them, while subsidiary components/classes would invoke S3Astore via callbacks which would never call to S3AFS. See BulkDeleteOperationCallbacksImpl for an example.
As well as delivering the alleged speed ups of the async client, the isolation should help testing, maintenance etc. My ultimate goal would be for the S3AFS class to to be a lot more minimal, with no Callback ever been implemented in it.
Anyway, that is a big undertaking. For now: no new accessors of s3client. And no network IO operations in S3AUtils, which should all ready be small utility methods (which is also why error handling is moving to )
Proposed:
- ListingOperationCallbacks to provide a createFileStatus() call to create a file status
- different implementations/branches of that call to eiher the S3AUtils class, or something which does the extra IO
this is going to have to be pulled out into listing callbacks, executed within the same auditspan and updating statistics. if a client is doing any IO we need to know who and why (server side) and how long (client)
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/AWSHeaders.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/ClientManagerImpl.java
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/ClientManagerImpl.java
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/ClientManagerImpl.java
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/CSEMaterials.java
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/ClientManagerImpl.java
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/CSEUtils.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/HeaderProcessing.java
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/ClientManagerImpl.java
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java
Show resolved
Hide resolved
@steveloughran I really appreciate your time and effort to review this changes. Indeed it was a detailed review. Please take a look Thanks |
b2207a4
to
3b31b5a
Compare
@shameersss1 unless you have no choice, please try not to rebase and force push prs once reviews have started. makes it a lot harder to see what has changed |
@@ -727,6 +736,28 @@ S3-CSE to work. | |||
</property> | |||
``` | |||
|
|||
#### 2. CSE-CUSTOM | |||
- Set `fs.s3a.encryption.algorithm=CSE-CUSTOM`. | |||
- Set `fs.s3a.encryption.cse.custom.cryptographic.material.manager.class.name=<fully qualified class name>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the new fs.s3a.encryption.cse
properties be added to core-default.xml
and index.md
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These configurations will be shown as part of the encryption page. similar to https://hadoop.apache.org/docs/r3.4.0/hadoop-aws/tools/hadoop-aws/encryption.html
Yeah, I had to rebase for merge conflicts in hadoop-common module. This commit (838dc24) contains the addressing of review comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that there is a little bit of complexity in S3AFileSystem with if (isCSEEnabled)
conditions in many places. I wonder if, instead of repeating the boolean checks all over the code, we could have a handler interface and extract the code to two implementations: one for isCSEEnabled and one for no CSE.
Other than that, the change looks good to me.
@steveloughran - Gentle reminder for the review |
@shameersss1 what do you think of @raphaelazzolini 's comment |
Thinking about this some more: those .instruction files raise a lot of problems; I don't like the way t There's already the notion that files beginning with . or _ are hidden, applications scanning directo So do we really need to hide them at all? Making them visible stops us playing yet more tricks with the What is actually in the files? Is it just some list of attributes encrypted with the same CSE settings? |
Just thinking if there is any better way to solve this. or may be we can simply quote ".instruction" files are not supported and should not be present. @steveloughran anythoughts on this ? |
37eead5
to
1deb43a
Compare
Force pushed to rebase with the trunk. @steveloughran - Thanks a lot for a detailed review. I know it is a pain to review changes which are big (~50 files). I really appreciate the time you put into this. I have addressed your comments with the new commit PS: Ignored instruction file handling for simplicilty |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@steveloughran - raised a revision by addressing the comments and checkstyle issue ifx. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Last big bit is the exception translation.
Let's try getting this in this week!
configureClientBuilder(S3AsyncClient.builder(), parameters, conf, bucket) | ||
.httpClientBuilder(httpClientBuilder); | ||
|
||
// TODO: Enable multi part upload with cse once it is available. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create a followup JIRA and reference it here "multipart upload pending with HADOOP-xyz"
/** | ||
* An interface that defines the contract for handling certain filesystem operations. | ||
*/ | ||
public interface S3AFileSystemHandler { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
|
||
import java.io.IOException; | ||
|
||
import org.apache.hadoop.conf.Configuration; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, put block below the "other" package
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
public ResponseInputStream<GetObjectResponse> getObject(S3AStore store, GetObjectRequest request, | ||
RequestFactory factory) throws IOException { | ||
boolean isEncrypted = isObjectEncrypted(store.getOrCreateS3Client(), factory, request.key()); | ||
return isEncrypted ? store.getOrCreateS3Client().getObject(request) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aah.
import java.io.IOException; | ||
import java.net.URI; | ||
|
||
import org.apache.hadoop.conf.Configuration; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move block below the software. one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
...ls/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AClientSideEncryptionCustom.java
Show resolved
Hide resolved
String xAttrPrefix = "header."; | ||
|
||
// Assert KeyWrap Algo | ||
assertEquals("Key wrap algo isn't same as expected", KMS_KEY_WRAP_ALGO, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assertJ assertThat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
* tests. | ||
* @param conf Configuations | ||
*/ | ||
private static void unsetEncryption(Configuration conf) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it make sense to move this to S3ATestUtils and use elsewhere, unsetting all encryption options? Then use in ITestS3AClientSideEncryptionCustom
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes make sense!
Class exceptionClass = AccessDeniedException.class; | ||
if (CSEUtils.isCSEEnabled(getEncryptionAlgorithm( | ||
getTestBucketName(getConfiguration()), getConfiguration()).getMethod())) { | ||
exceptionClass = AWSClientIOException.class; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you show me the full stack? 403 normally maps to AccessDeniedException, and it'd be good to keep the same. AWSClientIOException is just our "something failed" if there's nothing else
- Add
maybeTranslateEncryptionClientException()
inErrorTranslation
to look at the exception, if the classname string value matches "software.amazon.encryption.s3.S3EncryptionClientException" then map to AccessDeniedException - call that to S3AUtils.translateException just before the
// no custom handling.
bit
It may seem bad, but look at maybeExtractChannelException()
and other bits to see worse.
@@ -156,7 +159,12 @@ public void testGeneratePoolTimeouts() throws Throwable { | |||
ContractTestUtils.createFile(fs, path, true, DATASET); | |||
final FileStatus st = fs.getFileStatus(path); | |||
try (FileSystem brittleFS = FileSystem.newInstance(fs.getUri(), conf)) { | |||
intercept(ConnectTimeoutException.class, () -> { | |||
Class exceptionClass = ConnectTimeoutException.class; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
won't be needed once the translation is fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
let's make this the next big change, with everything else blocked before it goes in. That way, no more merge pain |
🎊 +1 overall
This message was automatically generated. |
@steveloughran - Thanks a lot for the review. I have addressed your comment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks; I like the translation now and you can see from the tests how it makes invocation a lot more consistent. Yes -the error translation code is very complicated. But consider this: if we had not remapped all v1 SDK exceptions to IOEs before throwing them, it would have been nearly impossible to upgrade to the V2 SDK without breaking so much code downstream.
keyring = | ||
getKeyringProvider(cseMaterials.getCustomKeyringClassName(), cseMaterials.getConf()); | ||
} catch (RuntimeException e) { | ||
throw new IOException("Failed to instantiate a custom keyring provider", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you include the error text of e in the new IOE; it's really easy to lose the base exception in log stack chains with deep wrapping/rewrapping.
IOException("Failed to instantiate a custom keyring provider: " + e, e)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense!
ack.
* @see SdkException | ||
* @see AwsServiceException | ||
*/ | ||
public static SdkException maybeExtractSdkExceptionFromEncryptionClientException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a bit long. How about maybeProcessEncryptionClientException()
this says it is handled, without saying exactly what happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack.
return ReflectionUtils.newInstance(keyringProviderClass, conf, | ||
new Class[] {Configuration.class}, conf); | ||
} catch (Exception ex) { | ||
throw new RuntimeException("Failed to create Keyring provider", ex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment about including the wrapped exception text
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
@steveloughran Yeah, Error translation feature make sense now. I have addressed your comments. |
🎊 +1 overall
This message was automatically generated. |
@steveloughran - Gentle reminder for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
ok, merged. provide a 3.4 version and I'll pull that in too. |
Sure, will work on branch-3.4 after HADOOP-19336 |
Followup to HADOOP-18708: S3A: Support S3 Client Side Encryption(CSE) (#6884) Contributed by Syed Shameerur Rahman
Description of PR
This commit adds support for S3 client side encryption (CSE). CSE can configured in two modes CSE-KMS where keys are provided by AWS KMS and CSE-CUSTOM where custom keys are provided by implementing custom keyring
CSE is implemented using S3EncryptionClient (V3 client) and additional configurations (mentioned below) were added to make it compatible with the older encryption client V1 and V2 which is turned OFF by default.
Inorder to have compatibility with V1 client the following operations are done.
Default Behavior
The configurations to make it backward compatible is turned OFF by default considering the performance implications. The default behavior is as follows
This PR is based on the initial work done by @ahmarsuhail as part of #6164
How was this patch tested?
mvn -Dparallel-tests -DtestsThreadCount=16 clean verify
.